DOCUMENT RESUME

ED 255 559                                                  TM 850 202

AUTHOR        Fuchs, Douglas; Fuchs, Lynn S.
TITLE         The Importance of Context in Testing: A Meta-Analysis.
PUB DATE      Mar 85
NOTE          35p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (69th,
              Chicago, IL, March 31-April 4, 1985).
PUB TYPE      Speeches/Conference Papers (150) -- Reports -
              Research/Technical (143)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Disabilities; Effect Size; *Examiners; *Experimenter
              Characteristics; *Individual Testing; Interpersonal
              Relationship; *Meta Analysis; Research Methodology;
              Scientific Methodology; Socioeconomic Status; *Test
              Bias; Testing Problems
IDENTIFIERS   *Context Effect; Familiarity; Positivism

ABSTRACT
      This article presents a meta-analysis of the effects
of examiner familiarity/unfamiliarity on children's performance
during individual testing. Data came from 22 controlled studies
involving 1489 subjects. In a typical study, the effect of examiner
familiarity raised test performance by .35 standard deviations.
Differential performance favoring the familiar examiner condition was
greater when subjects: (1) were of low socioeconomic status; (2) were
tested on comparatively difficult tests; and (3) knew the examiner
for a relatively long duration. The relationship of familiarity to
examinee's handicapped status was not clear. The effects of examiner
familiarity demonstrate the importance of contextual factors in
testing and question the positivistic view that the test instrument
is the single most important variable determining test performance.
(Author/BS)

Reproductions supplied by EDRS are the best that can be made
from the original document.



The Importance of Context in Testing:
A Meta-Analysis

Douglas Fuchs and Lynn S. Fuchs
Peabody College, Vanderbilt University






Requests for reprints should be sent to Douglas Fuchs, Box 328,
Department of Special Education, Peabody College, Vanderbilt University,
Nashville, TN 37203.






Test Procedure 
2 

Abstract 

This article presents a meta-analysis of the effects of examiner familiarity
on children's test performance. The data for the meta-analysis came from 22
controlled studies involving 1489 subjects. In the typical study, the effect
of examiner familiarity raised test performance by .35 standard deviations.
Differential performance favoring the familiar examiner condition was greater
when subjects (a) were of low SES, (b) were tested on comparatively
difficult tests, and (c) knew the examiner for a relatively long duration.
Implications are discussed for scientism, the popular epistemological basis
for understanding testing, and for practice.




The Importance of Context in Testing: A Meta-Analysis 

Positivism, or scientism, is the epistemological basis for the mainstream
tradition in the social sciences (Adorno, Albert, Dahrendorf, Habermas, Pilot,
& Popper, 1976; Bernstein, 1978). The positivistic ideal is the formulation
of universal laws, which are free of the restraints of particular contexts,
and therefore applicable to all. Hence, limiting, if not eliminating,
contextual influence is a key feature of our standard methods of experimental
design, measurement, and statistical analysis (Mishler, 1979).

Scientism also appears to govern the manner in which we administer tests,
as well as our understanding of what occurs during testing. Evidence for this
may be found in the most recent draft of the Joint Technical Standards for
Educational and Psychological Testing (AERA, APA, & NCME, 1984), where, on
page 1, the test situation is described as a formal experiment. This
perspective requires the examiner (i.e., unbiased investigator) to administer
the test instrument according to explicit nonvarying instructions (i.e.,
experimental treatment) in a controlled setting (i.e., laboratory). As in all
scientific endeavors, these attempts to objectify and standardize the test
situation are made, in part, to isolate the variable of interest, the test,
from other contextual or situational variables. By promoting the independence
and importance of the test instrument, we attempt to demonstrate a cause and
effect relationship between test performance and whatever examinee
characteristic the test claims to measure.

It is a fundamental presumption of the positivistic perspective that we
may conceptualize the test setting in this "decontextualized" manner; that
extra-test factors can be controlled, their effects on performance
neutralized. Specific related assumptions concerning the behavior of test
participants are that (a) the examiner-examinee relationship is static,
uni-directional, and predictable, with the examiner controlling the testing by
manipulating materials, questions, and feedback, while the examinee passively
observes and responds; (b) examiners objectively and reliably administer the
instrument and score performance; (c) test developers and test participants
share similar interpretations of important elements of testing, such as the
purposes of testing and the meaning of test instructions; and (d) the examinee
attends to variables in the test setting accorded importance by test
constructors and examiners, and ignores those stimuli to which examiners and
developers assign scant importance.

It is testimony to positivism's powerful influence on testing that these
assumptions infrequently have been explored. Nevertheless, a growing corpus
of empirical studies calls these assumptions into question. First, this
research suggests that examiners and examinees participate in dynamic,
bi-directional, and idiosyncratic relationships, resulting in unpredictable
behavior (Fuchs, Zern, & Fuchs, 1983; Mehan, 1978; Roth, 1974). Second,
examiners' scoring may be influenced by pretest information on examinees
(Babad, Mann, & Mar-Hayim, 1975; Fiscus, 1975; Hersh, 1971; Schroeder &
Kleinsasser, 1972), as well as by examinee characteristics (Fuchs & Fuchs,
1984; Masling, 1957). Third, test performance can be affected: (a) by
examinees' interpretation of the purpose of testing (Deyhle, 1983; Goodnow,
1976), comprehension of test instructions (Abramyan, 1977; MacKay, 1974;
Mehan, 1978), anxiety (Sarason, 1980), and pretest contact with examiners
(Fuchs, Fuchs, Power, & Dailey, in press); and (b) by examiners' personality
(Exner, 1966; Feldman & Sullivan, 1971; Sacks, 1952), reinforcement (Ayllon &
Kelly, 1972; Taylor & White, 1981; Tiber & Kennedy, 1964), attitudes about the
legitimacy of testing (Home & Garty, 1981), the order in which they
administer tests of varying difficulty (Zigler & Butterfield, 1968), and their
choice of test location (Labov, 1973; Seitz, Abelson, Levine, & Zigler, 1975;
Stoneman & Gibson, 1978).

Such findings challenge positivism's decontextualized view of testing,
and simultaneously corroborate a competing notion that contextual variables,
including test participants' unique experiential backgrounds, mediate between
the test instrument and performance. Comparative research in cognition (see
Cole & Means, 1981) corroborates this idea and suggests further that various
groups of examinees may respond differently to contextual variables in
assessment. If this were true, then situational factors systematically may
enhance the performance of certain groups and/or consistently depress the
performance of others. In such cases, situational variables would represent
systematic sources of error or bias.

Despite the possibility and importance of such an occurrence, this type
of test situation, or test procedure, bias generally has gone unexplored
(Flaugher, 1978). One of the few exceptions has been the issue of the effects
of examiner unfamiliarity on test performance. Interest in this facet of the
test procedure probably has been spurred by one or more of the following.
First, examiner unfamiliarity often has been perceived as an important and
desirable characteristic of standard testing (cf. Standards for Educational
and Psychological Tests, 1974), thereby making it a conspicuous component of
the test procedure. Second, and in apparent contradiction, there is a
long-standing developmental notion that, because children derive much of their
comprehension and feeling about a situation from significant adults in that
setting (Freud, 1921/1922; Piaget, 1965), examiner attributes, as well as
behaviors, are pivotal to examinee performance. Finally, psychological
research into related but substantively different areas, such as the
effectiveness of adults' social reinforcement on children's performance (cf.
Stevenson, 1965), has demonstrated indirectly the importance of the tester's
familiarity/unfamiliarity.

Nevertheless, there has been no previous quantitative integration of the
effects of examiner unfamiliarity on children's performance. Therefore, the
purpose of the present study was to conduct a meta-analysis on this topic,
specifically focusing on whether examiner unfamiliarity exerts a bias against
select subgroups, such as low-SES and handicapped children.

Methodology 

Search Procedure 

The search for pertinent studies comprised a five-step procedure. First,
employing the Thesaurus of Psychological Index Terms (APA, 1982), multiple
descriptors were generated for key topic-related terms. For example, rapport
alternately was identified by "examiner-examinee interaction," "interpersonal
factors," and "situational factors." Second, in June 1982, the descriptors
facilitated a computer search of three on-line data bases: ERIC (Educational
Resources Information Center, from 1966); Psych Info (Psychological Abstracts
Information Service, from 1967); and Dissertation Abstracts International
(from 1927). Following Dusek and Joseph (1983), the descriptors were entered
into the computer as isolated words or phrases to promote a comparatively
broad search.

Third, employing similar key descriptors, a manual search was conducted
of 12 educational, psychological, and speech/language journals for the years
1965-1982, inclusive. (If a journal began publication after 1965, all of its
volumes were explored.) These journals were: American Journal of Mental
Deficiency, Child Development, Developmental Psychology, Exceptional Children,
Journal of Abnormal and Social Psychology, Journal of Consulting and Clinical
Psychology, Journal of Experimental Child Psychology, Journal of Genetic
Psychology, Journal of Speech and Hearing Disorders, Language, Speech, and
Hearing in the Schools, Merrill-Palmer Quarterly, and Psychology in the
Schools. Fourth, the reference sections were explored for selected textbooks
on psychological and educational assessment, such as Sattler's (1974)
Assessment of Children's Abilities. Finally, titles in the references of
investigations discovered by these efforts were pursued.

Criteria for Relevant Studies

A study was considered for inclusion if it compared examiner familiarity
to unfamiliarity in terms of effects on examinees' performance during
individualized testing. For reasons discussed by Cooper (1982), "familiarity"
was defined broadly, including either children's personal acquaintanceship
with the examiner or their prior contact with a rather well-defined class of
adults, such as white middle-class females, of which the examiner was a
member. "Test performance" was defined as examinees' performance on one or more
IQ, speech/language, or educational achievement tests, or on experimental tasks
meant to simulate test items found in such measures. This definition of test
performance helps to distinguish the studies in the present review from those
that describe determinants of children's responsiveness to adults' social
reinforcement (cf. Stevenson, 1965). In similar fashion to some of the
investigations under review, the social reinforcement literature explores the
effects of negative, positive, and an absence of prior contact with an
experimenter on children's performance. However, these studies typically employ
persistence and/or rate of performance on relatively simple motoric tasks, such
as marble dropping (cf. Stevenson & Kennedy, 1966) or underlining S's (e.g.,
Rosenkrantz & Van De Riet, 1974). We believe such tasks are fundamentally
different from the more complex and demanding requirements in IQ,
speech/language, and educational achievement assessments, and probably
contribute to a qualitatively different experience for test participants. The
resulting sample included 24 studies of the effects of examiner
familiarity/unfamiliarity on children's test performance.

Data Extracted from Each Study

The effects of examiner familiarity and examiner unfamiliarity were noted
in each study. Effects for five studies were unclear and, in each case, an
attempt was made to obtain additional information from the investigator. One
researcher could not be reached and one did not respond, reducing the sample
from 24 to 22 studies (see the Appendix). Many of the 22 studies reported more
than one effect. In such instances, each effect was coded separately. In all,
the 22 studies yielded 38 effects of examiner familiarity/unfamiliarity.

Effects of examiner familiarity and unfamiliarity were related to one
composite procedural variable and nine substantive variables. The composite
procedural variable indicates the overall methodological quality of each
investigation. It was based on an aggregation of nine design-related
characteristics. These methodological characteristics, as well as the standards
against which they were judged to generate an overall quality index, follow:

1. Assignment of subjects to examiners. It was necessary for subjects
to be assigned randomly to examiners.

2. Assignment of subjects to treatments. Investigators were required
to assign subjects randomly to experimental conditions, or to use a repeated
measures design.

3. Examiner expectancy. Researchers were expected to ensure that
examiners were blind to the general experimental questions and, specifically,
to the familiar/unfamiliar nature of the test conditions.

4. Fidelity of treatment conditions. Investigators employing a personal
acquaintanceship definition of familiarity were required to make explicit that
unfamiliar examiners were strangers to examinees and that examiner familiarity
either represented a long-term acquaintanceship between test participants or
was the result of an experimentally induced procedure.

5. Multiple treatment effects. Studies were evaluated as acceptable
when effects of the familiar/unfamiliar examiner conditions did not appear to
be confounded with other factors, such as the gender of familiar and
unfamiliar testers.

6. Number of examiners. It was judged important that there be a minimum
of two familiar and two unfamiliar examiners.

7. Order of testing. Studies employing a repeated measures design were
required to counterbalance testing in familiar and unfamiliar examiner
conditions.

8. Scoring. It was necessary that scores be calculated by a blind
procedure.

9. Technical adequacy of dependent measure. At a minimum, a study was
expected to use measures with indices for internal or test-retest reliability
exceeding .69.

Interrater agreement on each of these dimensions, based on two raters'
scores on six randomly selected studies (26% of the sample), ranged from .67
to 1.00. Average agreement across all nine methodological characteristics
was .83.

The substantive variables noted in each study included the following:

1. Duration of treatment. This refers to the amount of time in which
either (a) examiners and examinees became personally acquainted or (b)
examinees became familiar with a type of examiner. We stratified the duration
of the acquaintanceship period into five levels, ranging from less than 16
minutes to more than 20 hours. This stratification does not distinguish
between long-term familiarity (such as exists between teacher and student)
and experimentally induced familiarity.

2. Examiners' professional familiarity with subjects. Examiners were
classified as "professionally familiar" with subjects if they had previous
experience with a type of child of which subjects were exemplars. Examiners
were identified as "professionally unfamiliar" if they had no prior
experience with a group of children of which subjects were members.

3. Examiners' training. A distinction was made between examiners
who were trained formally as professional testers (e.g., school
psychologists, speech clinicians) and those who were not (e.g., classroom
teachers and mothers).

4. Familiarity-inducing activity. This refers to whether the examiner
interacted with or simply observed the examinee during the familiarizing
phase of the study. Long-term acquaintanceship always was defined as
interactional in nature.

5. Handicapped status. Subjects were identified as either handicapped
or nonhandicapped. No distinction was made with respect to specific
categories of exceptionality (e.g., mental retardation vs. learning
disabilities) or to degree of handicapping condition (e.g., mild vs. profound).

6. Subjects' CA. Subjects' CA, ranging from 2 to 16 years, was
converted into months and treated as a continuous variable.

7. Subjects' SES. Initially, subjects' SES was classified in terms of
either (a) poverty level, (b) mix of poverty level and working class, (c)
middle-class, or (d) upper middle-class. For purposes of analysis, a and b
were collapsed, as were c and d, creating two SES categories: low and high.

8. Test location. Location was classified as either familiar or
unfamiliar to the examinee.

9. Type of test. Dependent variables were classified as IQ tests,
speech/language tests, or isolated tasks, which were taken from, or created
to closely resemble, certain dimensions of IQ, speech/language, or educational
achievement tests.

As a reliability check, two raters independently coded the nine
substantive characteristics in six randomly selected studies (26% of the
sample). Interrater agreement for each of the study features ranged from .67
to 1.00. Average agreement across all nine substantive variables was .93.




Characteristics of the Sample 

Of the 22 investigations included in this review, 18 were published
studies and 4 were unpublished studies. Among the published articles, 17
appeared in 14 different journals; 1 study was published in a book. Three
of the 4 unpublished investigations were doctoral dissertations; 1 study
was included in the proceedings of a conference. Nineteen of the 22 studies
were dated after 1970; the earliest was dated 1929. Also, 19 of the 22
studies defined examiner familiarity in terms of an examinee's personal
acquaintanceship with the examiner; in 3 investigations examinees became
familiar with a type of examiner, of which their eventual tester was an
exemplar. Among the 19 investigations employing a personal acquaintanceship
definition of familiarity, examiners and examinees were long-term
acquaintances in 8 studies, familiarity was experimentally induced in 10
investigations, and, in 1 study, the procedure facilitating personal
familiarity was unclear. A total of 1489 subjects participated in these
studies. Thirty-two percent of the subjects were male; 30% were female.
Researchers did not report the sex of the remaining subjects.

Results

Overall Effects

Results of the 22 studies were combined to provide three interrelated
aggregate descriptions of the effects of examiner familiarity: unbiased
effect size, percentage of distribution nonoverlap, and meta-analytic Z.

Unbiased effect size. A mean effect size was derived by determining the
standard mean difference between examinees' scores in the familiar and
unfamiliar examiner conditions and dividing this difference by the standard
deviation of the examinees' scores in the unfamiliar condition (see Glass,
McGaw, & Smith, 1981). Before averaging effect sizes, each one was converted to
an unbiased effect size (UES) to correct for the inconsistency in estimating
true from observed effect sizes (Hedges, 1981). The mean difference between
the biased and unbiased effect sizes was small (X̄ = .019, SE = .005), as has
been demonstrated elsewhere (e.g., Bangert-Drowns, Kulik, & Kulik, 1983).
Nevertheless, the UES was employed in all analyses to ensure mathematical
tractability of the data. For purposes of analysis, an effect was given a
positive sign if examinees achieved higher scores in the familiar condition.

For 32 of 38 effect sizes in the sample, examiner familiarity had a
positive impact on test performance; 6 effect sizes indicated the effect
of examiner familiarity was negative. The average UES was .35 (SD = .47;
SE = .076), t(37) = 4.67, p < .001.
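
The effect size computation described above can be sketched as follows. This
is a minimal illustration, not the authors' procedure: the function names, the
single hypothetical study, and the use of the common approximation
1 - 3/(4df - 1) for the Hedges (1981) correction are assumptions. The last
call uses only the summary values reported in the text.

```python
import math


def glass_delta(mean_familiar, mean_unfamiliar, sd_unfamiliar):
    """Mean difference scaled by the SD of the unfamiliar condition."""
    return (mean_familiar - mean_unfamiliar) / sd_unfamiliar


def unbiased_effect_size(delta, df):
    """Approximate small-sample correction attributed to Hedges (1981).

    df is the degrees of freedom of the SD in the denominator
    (assumed here to be n_unfamiliar - 1).
    """
    return delta * (1.0 - 3.0 / (4.0 * df - 1.0))


def one_sample_t(mean_es, sd_es, k):
    """t statistic and SE for testing a mean of k effect sizes against 0."""
    se = sd_es / math.sqrt(k)
    return mean_es / se, se


# Hypothetical study (not from the paper): 20 children per condition.
delta = glass_delta(mean_familiar=104.0, mean_unfamiliar=98.0, sd_unfamiliar=15.0)
ues = unbiased_effect_size(delta, df=19)

# Summary values reported in the text: mean UES = .35, SD = .47, 38 effects.
t_value, se = one_sample_t(0.35, 0.47, 38)
print(f"delta={delta:.3f} ues={ues:.3f} se={se:.3f} t={t_value:.2f}")
```

With the rounded summary values the standard error matches the reported .076;
the t statistic lands near, but not exactly on, the reported 4.67, presumably
because the published mean and SD are rounded.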

Percentage of distribution nonoverlap. The percentage of distribution
nonoverlap, or U3 statistic (Cohen, 1977), denotes the percentage of the
group with the smaller mean that is exceeded by 50% of the people in the
larger-meaned group. The U3 statistic indicated that the upper 50% of the
distribution of scores in the familiar examiner condition exceeded 64% of
the distribution of scores in the unfamiliar examiner condition. Given an IQ
test with a population mean of 100 and a standard deviation of 15, the use of
a familiar examiner would raise the typical score from 100 to 105.25, or from
the 50th to approximately the 64th percentile.
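
The arithmetic behind these figures can be checked with a short sketch. It
assumes normally distributed scores with equal variances in the two
conditions, in which case U3 is the standard normal cumulative probability at
the effect size; the function names are illustrative only.

```python
import math


def u3(effect_size):
    """Cohen's U3 under normality: standard normal CDF at the effect size."""
    return 0.5 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))


def shifted_score(effect_size, mean=100.0, sd=15.0):
    """Score of the typical examinee after a shift of effect_size SDs."""
    return mean + effect_size * sd


percentile = u3(0.35) * 100   # percentage of the unfamiliar distribution exceeded
iq = shifted_score(0.35)      # typical IQ score with a familiar examiner
print(f"U3 = {percentile:.1f}%, typical score = {iq:.2f}")
```

Under these assumptions the sketch gives a U3 of roughly 64% and a typical
score of 105.25, in line with the values stated above.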

Meta-analytic Z. Results from the 22 studies were combined to determine
the unweighted Stouffer meta-analytic Z, Zma (Rosenthal, 1978). This statistic
permits computation of the probability that the combined effect of children's
greater performance in the familiar examiner condition would occur by chance.
It was derived by changing the p values of all effects to Z scores, summing
them, and dividing this sum by the square root of the number of studies
included. When calculating a Z score for studies in which multiple dependent
variables were analyzed, a median p value was calculated for each study and
its associated Z score was used in the meta-analysis (see Rosenthal & Rubin,
1978). The resulting Zma was 7.20, p < .001.
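
The steps above can be sketched as follows. The p values are hypothetical and
are assumed to be one-tailed in the direction favoring the familiar examiner;
the function names are illustrative, and the sketch is not a reconstruction of
the authors' data.

```python
import math
from statistics import NormalDist, median


def p_to_z(p_one_tailed):
    """Convert a one-tailed p value to a standard normal deviate."""
    return NormalDist().inv_cdf(1.0 - p_one_tailed)


def study_z(p_values):
    """One Z per study: the Z associated with the study's median p value."""
    return p_to_z(median(p_values))


def stouffer_z(z_scores):
    """Unweighted Stouffer Z: sum of Zs over the square root of their count."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def combined_p(z_ma):
    """One-tailed probability associated with the combined Z."""
    return 1.0 - NormalDist().cdf(z_ma)


# Hypothetical p values for four studies; some report several measures.
hypothetical = [[0.01, 0.04, 0.20], [0.03], [0.30, 0.10], [0.002]]
z_scores = [study_z(ps) for ps in hypothetical]
z_ma = stouffer_z(z_scores)
print(f"Z_ma = {z_ma:.2f}, p = {combined_p(z_ma):.5f}")
```

Taking one Z per study, rather than one per effect, keeps studies with many
dependent measures from dominating the combined statistic.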

Credence in a statistically reliable meta-analytic Z may be compromised
by the suspicion that researchers do not report nonsignificant results
(Greenwald, 1975). Rosenthal (1979) described a method for determining the
number of unreported null effects that would be needed to reduce a
meta-analytic Z to nonsignificance. The larger this "fail-safe N," the more
confidence one can have in the reliability of a meta-analytic result. This
investigation's fail-safe N was 418. As a rule of thumb, Rosenthal suggested
that a meta-analytic Z be regarded as resistant to the "file drawer problem"
of unreported null results if the fail-safe N exceeds 5k + 10, where k is the
number of reported effects. In the current study this requisite number was
200. Thus, the fail-safe N of 418 was more than twice as large.
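
A minimal sketch of the fail-safe computation follows. It assumes the usual
form of Rosenthal's (1979) formula with a one-tailed criterion of Z = 1.645;
the function names and the Z scores passed to fail_safe_n are illustrative.
Only the tolerance level is checked against the text, because the reported
fail-safe N of 418 depends on the individual Z scores, which are not given
here.

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Number of unreported null results needed to bring the combined
    Z down to z_crit: (sum of Zs)^2 / z_crit^2 - k."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


def tolerance(k):
    """Rule-of-thumb threshold 5k + 10 for k reported effects."""
    return 5 * k + 10


requisite = tolerance(38)              # 38 reported effects in this review
print(requisite, 418 > 2 * requisite)  # threshold and the "twice as large" check

# Illustrative only: four results that each just reach the criterion.
print(round(fail_safe_n([1.645] * 4), 6))
```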

Relation between UES and Study Characteristics

Methodological quality of studies. The methodological quality of each
of the 22 studies was quantified employing a four-step procedure. First,
every investigation was analyzed in terms of the nine design-related
characteristics and criteria described above. These design features were coded
acceptable (0 points), unacceptable (1 point), or not applicable. As
mentioned, the mean interrater agreement for the codings across the nine
methodological characteristics was .83. Second, a weight of 1 or 2 was
assigned to each methodological characteristic. "Technical adequacy of
dependent measure," "assignment of subjects to treatments," and "assignment
of subjects to examiners" received a weight of 2; the remaining six design
characteristics received a weight of 1. Third, a composite score was
generated for each study by multiplying the coded values (0 or 1) by the
assigned weights (1 or 2), summing these products, and then dividing the sum
by the number of applicable study characteristics. Finally, a frequency
distribution of these composite scores was generated. It indicated that 55% and
45% of investigations received composite scores above [illegible] (low quality)
and below [illegible] (high quality), respectively.
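
The composite score described in the third step can be sketched as follows.
The weights follow the text; the dictionary keys and the example study are
hypothetical, and None is used here to mark a characteristic coded "not
applicable."

```python
# Weights as stated in the text: three characteristics weighted 2, six weighted 1.
WEIGHTS = {
    "assignment_to_examiners": 2,
    "assignment_to_treatments": 2,
    "examiner_expectancy": 1,
    "fidelity_of_treatment": 1,
    "multiple_treatment_effects": 1,
    "number_of_examiners": 1,
    "order_of_testing": 1,
    "scoring": 1,
    "technical_adequacy": 2,
}


def composite_quality(codes):
    """Weighted sum of codes (0 = acceptable, 1 = unacceptable) divided by
    the number of applicable characteristics; higher means lower quality."""
    applicable = {name: code for name, code in codes.items() if code is not None}
    if not applicable:
        raise ValueError("no applicable characteristics")
    weighted_sum = sum(WEIGHTS[name] * code for name, code in applicable.items())
    return weighted_sum / len(applicable)


# Hypothetical between-subjects study: order of testing does not apply.
example = {
    "assignment_to_examiners": 1,
    "assignment_to_treatments": 0,
    "examiner_expectancy": 0,
    "fidelity_of_treatment": 0,
    "multiple_treatment_effects": 0,
    "number_of_examiners": 1,
    "order_of_testing": None,
    "scoring": 1,
    "technical_adequacy": 0,
}
score = composite_quality(example)
print(score)
```

Dividing by the count of applicable characteristics, as the text specifies,
keeps studies with fewer relevant design features on a comparable scale.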

Twenty-one effect sizes were assigned the status of low quality, with
an average effect size of .51 (SD = .50); 17 effect sizes were assigned the
status of high quality, with an average effect size of .17 (SD = .37). The
correlation between the studies' quality ratings and UESs was -.38 (p < .05).
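
As a consistency check, the two subgroup means can be pooled by their number
of effect sizes and compared with the overall average UES. The helper name is
illustrative, and the inputs are the rounded values given in the text.

```python
def pooled_mean(groups):
    """Mean of subgroup means, weighted by the number of effects in each."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number of effect sizes, mean effect size) for low- and high-quality studies.
overall = pooled_mean([(21, 0.51), (17, 0.17)])
print(f"{overall:.3f}")
```

The pooled value of about .36 is close to the overall average of .35 reported
earlier; the small gap presumably reflects rounding of the subgroup means.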

Substantive features of studies. Analyses were conducted to determine
whether substantive features of the studies mediated the findings of the
meta-analysis. Correlations were run to determine which of the substantive
variables were related to examiner familiarity outcomes. Table 1 displays
the means and standard deviations of the UESs, and correlations of the UESs
with the nine substantive features coded in the meta-analysis.



Insert Table 1 about here



Three of the 9 substantive variables correlated significantly and
moderately with UES: Duration of Familiarity, SES, and Type of Test (see
Table 1). These correlations indicated that stronger performance with the
familiar examiner was related to (a) examiner-examinee familiarity of
comparatively long duration, (b) examinees' low SES, and (c) relatively
demanding tests. A substantive feature correlating in weak fashion with UES
was Examiners' Professional Training (see Table 1).

Duration of Familiarity, SES, and Type of Test were entered as predictor
variables into a forward stepwise multiple regression. Subjects' CA also was
employed as a predictor because, among the remaining substantive variables,
it demonstrated the highest correlation (r = .21) and claimed 38 effect
sizes. These four predictor variables correlated weakly among themselves;
correlations ranged from [illegible] to [illegible], with a median correlation
coefficient of .12.

Each of the equations, displayed in Table 2, indicates that the predictor
variables were statistically significant in explaining the variance in the
UES. In the last equation, incorporating all four variables, SES, Duration
of Familiarity, CA, and Type of Test explained 22%, 8%, 7%, and 5% of the
variance, respectively. However, the regression was calculated on a
relatively small number of effect sizes and, as a consequence, findings may be
unstable (Kerlinger & Pedhazur, 1973). Thus, in summarizing and decomposing
the linear dependency of the UES on the four predictor variables, results
from the regression should be viewed as a heuristic addition to the foregoing
correlational analysis.



Insert Table 2 about here 






Discussion



This meta-analysis indicated that examinees achieve higher scores when
tested by familiar than unfamiliar examiners. The magnitude of this
differential performance was both statistically and practically significant.
However, caution should be exercised in interpreting examinees' stronger
performance in the familiar examiner condition because larger effect sizes were
associated with studies of relatively weak methodologies. Additionally, it
is unclear whether, and if so to what extent, these results are robust.
Although examinees' higher scores with familiar examiners appeared unrelated
to whether testers were professionally trained or not, the low number of
effect sizes associated with trained (n = 3) and untrained (n = 8) testers
undermines confidence in this correlation. Similarly, we are unable to
determine possible moderating or mediating effects of examiners'
professional familiarity/unfamiliarity with the group of children of which the
examinee was a member. This is because only one study reported a controlled
contrast of this examiner-related characteristic.

On the other hand, duration of the familiarity-inducing activity was
associated in a strong, positive fashion with effect size. This relation
suggests examiner familiarity is a legitimate and important construct. In
addition to duration of familiarity, the nature of the test instrument seemed
to mediate examinees' differential performance: Examinees performed more
strongly in the familiar condition when tested on a difficult measure (e.g.,
an IQ test); however, such differential performance lessened when the measure
was comparatively simple (e.g., a speech test). This result is consonant with
empirical evidence in the social reinforcement literature, which suggests
prior contact with an experimenter increases the level of subjects'
responding on complex, but not on simple, tasks (Crow, 1964; Rosenkrantz & Van
De Riet, 1974). Rosenthal (1980) has suggested an explanation for this pattern
of findings: Examiner unfamiliarity engenders anxiety in examinees, and
whereas this anxiety enhances motivation to do well on simple tasks, it
interferes with the higher order thinking required by complex tasks. Thus,
examiner familiarity is presumed to vitiate examinees' anxiety and its
negative influence on complex task performance.

The most important subject variable to intercede between examiner
familiarity and test performance was SES. Correlational analysis indicated that
low SES children's differential performance in favor of the familiar examiner
was greater than that of high SES children. This result suggests examiner
unfamiliarity selectively depresses the scores of low SES children.

Enhancing the importance of this finding is that most examiners in
clinical and educational settings are strangers to the children they test.
This has been substantiated directly by reports of practicing professionals
(Fuchs, 1981). Indirect evidence comes from an analysis (Fuchs, Fuchs,
Dailey, & Power, [date illegible]) of the user manuals of 20 well-known
intelligence and speech/language measures: Only 2 manuals suggested that
examiners establish pretest contact with their examinees. Moreover, the
Standards for Educational and Psychological Tests (1974) seem to discourage
examiner familiarity, as reflected in a call for "impersonal" procedures
(p. 64) and in a recommendation that testers "minimize" (p. 6[?]) any effect
they may have on children's performance. Therefore, on normative tests, the
suboptimal performance of low SES children may be compared to the maximal
performance of other groups, such as high SES examinees. If so, examiner
familiarity is a source of systematic error or bias.

Our findings of apparent test procedure bias may explain at least
partially why, on average, low SES children obtain lower IQ scores than high SES
children, a phenomenon first described by Binet (see Lippmann, 1976) and
repeatedly corroborated since then (e.g., Masland, Sarason, & Gladwin, 1978;
Tyler, 1965). A frequent estimate of the magnitude of this difference in IQ
performance has been one standard deviation (e.g., Christiansen & Livermore,
1970; Jensen, 1970). Low SES children's test performance conventionally has
been interpreted as a rather straightforward demonstration of those skills and
abilities that the tests claim to measure. Typically, their comparatively
poor showing on these tests has been attributed primarily to either poor genes
or a disadvantaged environment (see Nichols, 1978).

Nevertheless, current findings question such interpretations that
presume a cause and effect relation between children's cognitive processes and
their performance on tests that purportedly measure salient cognitive and/or
academic abilities. Our results indicate that at least one extra-test factor,
examiner unfamiliarity, also affects the performance of select groups of
children. For low SES pupils, the effect size associated with examiner
familiarity was .53, which is the equivalent of a difference of approximately
8 points on a standardized IQ test with a mean of 100 and standard deviation
of 15. Furthermore, as mentioned above, a growing literature suggests there
may be additional contextual variables constituting the typical test
situation, which influence certain pupils' performance. Thus, one legitimately
might wonder how much of the reported difference between low and high SES
children's IQ performance may be explained by differential responses to
contextual variables. Until we know the answer to such a question, attributing
this discrepancy to a difference in the groups' ability level seems
precipitous.
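The conversion from an effect size to IQ points used above is a single multiplication by the scale's standard deviation. The following is a minimal sketch of that arithmetic, assuming the mean effect sizes listed for low and high SES subjects in Table 1 (.53 and .24) and a scale standard deviation of 15; the function and variable names are hypothetical and serve only as illustration.

```python
# Convert a standardized effect size (in SD units) into points on a test scale.
# The helper name and its default are illustrative assumptions, not the authors' code.

def effect_size_to_points(effect_size: float, scale_sd: float = 15.0) -> float:
    """Return the score difference implied by an effect size on a scale with the given SD."""
    return effect_size * scale_sd

# Mean effect sizes (UESs) by SES, as listed in Table 1.
ses_effect_sizes = {"low SES": 0.53, "high SES": 0.24}

points_by_group = {
    group: effect_size_to_points(es) for group, es in ses_effect_sizes.items()
}

for group, points in points_by_group.items():
    print(f"{group}: {points:.2f} IQ points")
```

Under these assumptions the low SES figure comes to just under 8 points, which matches the approximation given in the text.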

Although subjects' SES was related strongly to UES, their handicapped/
nonhandicapped status was not. However, this finding may be misleading.
Among the relatively few studies employing handicapped subjects, speech and/
or language-impaired children consistently performed more strongly with the
familiar examiner, whereas mentally retarded children either performed
stronger with the unfamiliar examiner or did not demonstrate differential
performance. Thus, by combining results from the few investigations involving
speech and/or language-impaired, mentally retarded, and other handicapped
children, this meta-analysis may be masking possible interaction effects
between type of handicap and the familiarity/unfamiliarity of the examiner.
Future research might experimentally test such a possibility.

In sum, the effects of examiner familiarity demonstrate the importance
of contextual factors in testing. Such factors seem to intercede between
the test and performance, questioning the positivistic view that the test
instrument is the single most important, if not the exclusive, variable to
determine test performance. Although this proposition contradicts
traditional thinking about the test situation, it is not new. More than a
decade ago, Cronbach (1971) stated that the test is only one element in a
procedure, and the validity of data obtained in educational and psychological
assessment is dependent upon the procedure as a whole. However, adopting this
perspective will be difficult; it not only complicates interpretation of test
performance, it also presumes the existence of an adequate data base on
contextual effects, which has yet to be developed. Nevertheless, accuracy in
interpreting test results requires that we acknowledge the importance of
context in assessment and continue the challenging task of defining the
relation between situational factors and test performance.






References 

Abramyan, L.A. (1977). On the role of verbal instructions in the direction
of voluntary movements in children. Quarterly Newsletter of the
Institute for Comparative Human Development, 1, 1-4.

Adorno, T.W., Albert, H., Dahrendorf, R., Habermas, J., Pilot, H., & Popper,
K.R. (1976). The positivist dispute in German sociology. New York:
Harper & Row.

American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education. (1974).
Standards for educational and psychological tests. Washington, DC:
American Psychological Association.

American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education. (1984, February).
Draft: Joint technical standards for educational and psychological
testing (Available from APA, Office of Scientific Affairs, 1200 17th
St., Washington, DC 20036).

Ayllon, T., & Kelly, K. (1972). Effects of reinforcement on standardized test
performance. Journal of Applied Behavior Analysis, 5.

Xntr t -an V}\rc))n]t)\:\r:^\ \ ^ Of i i f i >iu (l^'^8^). rh^' sinrus o f p svrhologi (^a 1 
Mwif • liifh^ ( h'J k'tU V',jshin»»f )n, i)C; AuJhor. 

Babad, E.Y., Mann, M., & Mar-Hayim, M. (1975). Bias in the scoring of the
WISC subtests. Journal of Consulting and Clinical Psychology, 43, 268.




Bangert-Drowns, R.L., Kulik, J.A., & Kulik, C.C. (1983). Effects of
coaching programs on achievement test performance. Review of Educational
Research, 53, 571-585.

Bernstein, R.J. (1978). The restructuring of social and political theory.
Philadelphia: University of Pennsylvania Press.

Christiansen, T., & Livermore, G. (1970). A comparison of Anglo-American
and Spanish-American children on the WISC. Journal of Social
Psychology, 81, 9-14.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences.
New York: Academic Press.

Cole, M., & Means, B. (1981). Comparative studies of how people think.
Cambridge, MA: Harvard University Press.

Cooper, H.M. (1982). Scientific guidelines for conducting integrative
research reviews. Review of Educational Research, 52, 291-302.

Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.),
Educational measurement (pp. 441-507). Washington, DC: American Council
on Education.

Crow, L. (1964). Public attitudes and expectations as a disturbing variable
in experimentation and theory. Unpublished manuscript, Harvard
University.

Deyhle, D. (1981). Learning failure: Test-taking and the Navajo student
[Summary]. Proceedings of the Fourth Annual University of Pennsylvania
Ethnography in Education Research Forum, 5.

Dusek, J.B., & Joseph, G. (1983). The bases of teacher expectations: A
meta-analysis. Journal of Educational Research, 75, 327-346.




Exner, J.E. (1966). Variations in WISC performances as influenced by
differences in pre-test rapport. Journal of General Psychology, 74,
299-306.

Feldman, S.E., & Sullivan, D.S. (1971). Factors mediating the effects of
enhanced rapport on children's performance. Journal of Consulting and
Clinical Psychology, 36, 302.

Fiscus, E.G. (1975). The effects of pre-test information on school
psychologists' scoring of the Wechsler Intelligence Scale for Children.
Dissertation Abstracts International, 36, 1387A. (University
Microfilms No. 75-19-4[illegible]).

Flaugher, R. (1978). The many definitions of test bias. American
Psychologist, 33, 671-679.

Freud, S. (1922). Group psychology and the analysis of the ego. London:
International Psychoanalytical Press. (Originally published, 1921).

Fuchs, D. (1981, April). Differential responses of preschool
language-handicapped children and familiar and unfamiliar testers as a
function of task complexity, length of acquaintanceship, and sex of child.
In [illegible] (Chair), [illegible] and issues of validity: The
influence of situational variables on children's cognitive performance.
Symposium presented at the annual meeting of the American Educational
Research Association, Los Angeles.

Fuchs, D., Fuchs, L.S., Dailey, A.M., & Power, M.H. ([year illegible]).
Effects of pretest contact with experienced and inexperienced examiners
on handicapped children's test performance (Research Report No.
[illegible]). Minneapolis: University of Minnesota, Institute for
Research on Learning Disabilities.

Fuchs, D., Fuchs, L.S., Power, M.H., & Dailey, A.M. (in press). Bias in
the assessment of handicapped children. American Educational Research
Journal.

Fuchs, D., Zern, D.S., & Fuchs, L.S. (1983). Participants' verbal and
nonverbal behavior in familiar and unfamiliar test conditions.
Diagnostique, 8, 159-169.

Fuchs, L.S., & Fuchs, D. (1984). Examiner accuracy during protocol
completion. Journal of Psychoeducational Assessment, 2, 101-108.

Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social
research. Beverly Hills, CA: Sage.

Goodnow, J. (1976). The nature of intelligent behavior: Questions raised
by cross-cultural research. In L.B. Resnick (Ed.), The nature of
intelligence. Hillsdale, NJ: Erlbaum.

Greenwald, A.G. (1975). Consequences of prejudice against the null
hypothesis. Psychological Bulletin, 82, 1-20.

Hedges, L. (1981). Distribution theory for Glass's estimator of effect size
and related estimators. Journal of Educational Statistics, 6,
359-3[illegible]1.

Hersh, J.B. (1971). Effects of referral information on testers. Journal of
Consulting and Clinical Psychology, 37, 116-122.

Home, L.V., & Garty, M.K. (1981, April). What the test score really
reflects: Observations of teacher behavior during standardized
achievement test administration. Paper presented at the annual meeting of the
American Educational Research Association, Los Angeles.




Jensen, A.R. (1970). Learning ability, intelligence, and educability. In
V.L. Allen (Ed.), Psychological factors in poverty (pp. 106-132). New
York: Academic Press.

Kerlinger, F.N., & Pedhazur, E.J. (1973). Multiple regression in behavioral
research. New York: Holt, Rinehart, & Winston.

Labov, W. (1973). The logic of nonstandard English. In F. Williams (Ed.),
Language and poverty. Chicago: Markham.

Lippmann, W. (1976). Tests of hereditary intelligence. In N.J. Block & G.
Dworkin (Eds.), The IQ controversy (pp. 21-29). New York: Pantheon
Books. (Reprinted from Popular Science Monthly, May, 1915).

MacKay, R. (1974). Standardized tests: Objective and objectivized
measures. In A.V. Cicourel et al. (Eds.), Language use and school
performance (pp. 218-247). New York: Academic Press.

Masland, R.L., Sarason, S.B., & Gladwin, T. (1978). Mental subnormality.
New York: Basic Books.

Masling, J.M. (1957). The effects of warm and cold interaction on the
interpretation of a projective protocol. Journal of Projective
Techniques.

Mehan, H. (1978). Structuring school structure. Harvard Educational
Review.

Mishler, E.G. (1979). Meaning in context: Is there any other kind?
Harvard Educational Review, 49, 1-19.

Nichols, R.C. (1978). Policy implications of the IQ controversy. Review
of Research in Education, 6, 3-4[illegible].

Piaget, J. ([year illegible]). The moral judgment of the child. New York:
Free Press.




Rosenkrantz, A.L., & Van De Riet, V. (1974). The influence of prior contact
between child subjects and adult experimenters on subsequent child
performance. Journal of Genetic Psychology, 124, 79-90.

Rosenthal, R. (1978). Combining results of independent studies.
Psychological Bulletin, 85, 185-193.

Rosenthal, R. (1979). The "file drawer problem" and tolerance for null
effects. Psychological Bulletin, 86, 638-641.

Rosenthal, R. (1980). Experimenter effects in behavioral research (Enlarged
paperback edition). New York: Irvington.

Rosenthal, R., & Rubin, D.B. (1978). Interpersonal expectancy effects: The
first 345 studies. The Behavioral and Brain Sciences, 3, 377-415.

Roth, D.R. (1974). Intelligence testing as a social activity. In A.V.
Cicourel et al. (Eds.), Language use and school performance (pp.
143-217). New York: Academic Press.

Sacks, E.L. (1952). Intelligence scores as a function of experimentally
established social relationships between child and examiner. Journal
of Abnormal and Social Psychology, 47, 354-358.

Sarason, I.G. (Ed.). (1980). Test anxiety: Theory, research, and
applications. Hillsdale, NJ: Erlbaum.

Sattler, J.M. (1974). Assessment of children's performance. Philadelphia:
Saunders.

Schroeder, H.H., & Kleinsasser, L.D. (1972). Examiner bias: A determinant
of children's verbal behavior on the WISC. Journal of Consulting and
Clinical Psychology, 39, 451-454.




Seitz, V., Abelson, W.D., Levine, E., & Zigler, E. (1975). Effects of place
of testing on the Peabody Picture Vocabulary Test scores of
disadvantaged Head Start and non-Head Start children. Child Development, 46,
481-486.

Stevenson, H.W. (1965). Social reinforcement of children's behavior. In
L.P. Lipsitt & C.C. Spiker (Eds.), Advances in child development, II
(pp. 97-126). New York: Academic Press.

Stevenson, H.W., & Hill, K.T. (1966). Use of rate as a measure of response
in studies of social reinforcement. Psychological Bulletin, 66,
321-326.

Stoneman, Z., & Gibson, S. (1978). Situational influences on assessment
performance. Exceptional Children, 46, 166-169.

Taylor, C., & White, K.R. (1981, April). Effects of reinforcement and
training on Title I students' group standardized test performance.
Paper presented at the annual meeting of the American Educational
Research Association, Los Angeles.

Tiber, N., & Kennedy, W.A. (1964). The effects of incentives on the
intelligence test performance of different social groups. Journal of
[illegible].

Tyler, L.E. (1965). The psychology of human differences. New York:
Appleton-Century-Crofts.

Zigler, E., & Butterfield, E.C. (1968). Motivational aspects of changes in IQ
test performances of culturally deprived nursery school children. Child
Development, 39, 1-14.


Appendix 

22 Studies of Examiner Familiarity 

Back, R.D., & Dana, R.H. (1980). Self-help for male WISC examiners by
pretest exposure to children. Perceptual and Motor Skills, 51, 838.

Costello, J. (1970). Effects of pretesting and examiner characteristics on
test performance of young disadvantaged children [Summary].
Proceedings of the 78th Annual Convention of the American Psychological
Association, 309.

Duffy, O.B. (1972). The differential effects of psychologist as examiner
and teacher as examiner on word recognition in oral reading of third
and fifth grade children. Dissertation Abstracts International, 33,
3375A. (University Microfilms No. 73-00-617).

DuRant, M.B. (1975). The effect of examiner familiarity on two sub-tests
of the Illinois Test of Psycholinguistic Abilities. Dissertation
Abstracts International, 36, 3503A-3504A. (University Microfilms No.
75-28-968).

^Feldman, S.E., & Sullivan, D.S. (1971). Factors mediating the effects of
enhanced rapport on children's performance. Journal of Consulting and
Clinical Psychology, 36, 302.

Field, T. (1981). Ecological variables and examiner biases in assessing
handicapped preschool children. Journal of Pediatric Psychology, 6,
[pages illegible].

Fuchs, D., Featherstone, N.L., Garwick, D.R., & Fuchs, L.S. (1984).
Effects of examiner familiarity and task characteristics on speech-
and language-impaired children's test performance. Measurement and
Evaluation in Guidance, 16, 198-204.




Fuchs, D., Fuchs, L.S., Garwick, D.R., & Featherstone, N. (1983). Test
performance of language-handicapped children with familiar and
unfamiliar examiners. Journal of Psychology, 114, 37-46.

Fuchs, D., Fuchs, L.S., Dailey, A.M., & Power, M.H. (in press). The effect
of examiners' personal familiarity and professional experience on
handicapped children's test performance: A case of who, not what you
know? Journal of Educational Research.

Fuchs, D., Fuchs, L.S., Power, M.H., & Dailey, A.M. (in press). Bias in
the assessment of handicapped children. American Educational Research
Journal.

Irons, D. (1981). The effect of familiarity with the examiner on WISC-R
Verbal, Performance, and Full Scale scores. Psychology in the Schools,
18, 496-499.

Jacobson, L.I., Berger, S.E., Bergman, R.L., Millham, J., & Greeson, L.E.
(1971). Effects of age, sex, systematic conceptual learning,
acquisition of learning sets, and programmed social interaction on the
intellectual and conceptual development of preschool children from poverty
backgrounds. Child Development, 42, 1399-1415.

Kinnie, E.J., & Sternlof, R.E. (1971). The influence of nonintellective
factors on the IQ scores of middle- and lower-class children. Child
Development, 42, 1989-1995.

Klein, P.S. (1982). Cognitive performance of kindergarten children when
tested by parents and strangers. In N. Nir-Janiv, B. Spodek, & D. Steg
(Eds.), Early childhood education (pp. 429-440). New York: Plenum.

Marine, E.L. (1929). The effect of familiarity with the examiner upon
Stanford-Binet test performance. Teachers College Contributions to
Education, 381, entire issue.

Olswang, L.B., & Carpenter, R.L. (1978). Elicitor effects on the language
obtained from young language-impaired children. Journal of Speech and
Hearing Disorders, 43, 76-88.

Orost, J.H. (1972). Effects of examiner age and familiarity on test
performance of third grade and kindergarten girls. Dissertation Abstracts
International, 32, 6011A-6012A. (University Microfilms No. 72-16-092).

Piersel, W.C., Brody, G.H., & Kratochwill, T.R. (1977). A further
examination of motivational influences on disadvantaged minority group
children's intelligence test performance. Child Development, 48,
1142-1145.

Sacks, E.L. (1952). Intelligence scores as a function of experimentally
established social relationships between child and examiner. Journal
of Abnormal and Social Psychology, 47, 354-358.

Thomas, A., Hertzig, M.E., Dryman, I., & Fernandez, F. (1971). Examiner
effect in IQ testing of Puerto Rican working-class children. American
Journal of Orthopsychiatry, 41, 809-821.

Tsudzuki, T., Hata, Y., & Kuze, T. (1956). A study on the rapport between
the examiner and the subject. Japanese Journal of Psychology, 27,
22-28.

Zigler, E., Abelson, W.D., & Seitz, V. (1973). Motivational factors in the
performance of economically disadvantaged children on the Peabody
Picture Vocabulary Test. Child Development, 44, 294-303.




Footnotes 

^Additional information on the Back and Dana, Feldman and Sullivan, and
Irons published studies was obtained from the following fugitive sources:
Back, R.D., & Dana, R.H. (1980). The effects of pretest exposure on sex
of examiner influence on the Wechsler Intelligence Scale for Children
(Document NAPS-03775). Available from Microfiche Publications, PO Box 3513,
Grand Central Station, New York, NY 10017; Feldman, S.E., & Sullivan,
D.S. (n.d.). Factors influencing the effects of enhanced rapport upon
children's test performance. Unpublished manuscript, Northern Illinois
University, DeKalb; Irons, D.A. (1980). The effect of familiarity with
the examiner on WISC-R Verbal, Performance, and Full Scale scores (Doctoral
dissertation, Texas Tech University). Dissertation Abstracts International,
41, [page illegible].






Table 1

Means, Standard Deviations, and Correlations of UESs
by Substantive Features of the Studies

Substantive feature                       X             SD            N             r

Duration of familiarity                                               36            .47**
  Less than 16 minutes                    .09           .62           7
  Between 16 and 120 minutes              .13           .13           11
  Between 121 minutes and 10 hours        .62           .41           8
  Between 11 and 20 hours                 .7[illegible] .4[illegible] [illegible]
  [label illegible]                       [illegible]   [illegible]   7

Examiners' professional familiarity
with subject type^a                                                   21            .06
  Familiar                                .16           .37           20
  Unfamiliar                              .17                         1

Examiners' training                                                   11            .20
  Professionally trained                  .31           .32           3
  Professionally untrained                .06           .52           8

Familiarity-inducing activity^a                                       38            .08
  Interaction                             .35           .47           37
  Observation                             .58                         1

Subjects' handicapped status                                          [illegible]   [illegible]
  Handicapped                             [illegible]   [illegible]   11
  Nonhandicapped                          .39           .51           25

Subjects' CA^b                                                        38            .21

Subjects' SES                                                         37            -.40**
  Low                                     .53           .50           17
  High                                    .24           .40           20

Test location                                                         15            .19
  Familiar                                .26           .34           13
  Unfamiliar                              .43           .17           2

Type of test                                                          38            -.33*
  [label illegible]                       .54           .54           18
  Speech/language                         .19           .35           18
  Isolated tasks                          .24           .19           2

^a Given the distribution of effect sizes across values of these variables,
the reported correlations are likely to be unstable. The same may be true
for other variables such as Test Location.

^b Since subjects' CA was treated as a continuous variable, there are no group
means to report.

*p < .05.

**p < .01.



Table 2

Results of Multiple Regression on Predicting UESs

                  Multiple      R2            R2
Source            R             Cumulative    Change          F^a        F^b

SES               .47           .22           .22             10.37**    10.37**
Duration          .55           .30           .08             7.61**     3.99*
CA                .61           .37           .07             6.66**     3.62*
Type of test      [illegible]   .42           .0[illegible]   5.91**     2.69*

^a F value is for the regression equation.
^b F value is for the contribution of each variable.
*p < .05.
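
The columns of Table 2 are linked by simple arithmetic: cumulative R2 is the square of Multiple R, and R2 Change is the difference between successive cumulative values. The sketch below checks this for the three rows whose Multiple R values are legible (.47, .55, .61). The variable names are hypothetical, and rounding to two decimals is an assumption made to match the precision of the table.

```python
# Recompute cumulative R^2 and R^2 change from the Multiple R column of Table 2.
# Only rows with a legible Multiple R are included; names are illustrative.

multiple_r = {"SES": 0.47, "Duration": 0.55, "CA": 0.61}

cumulative_r2 = {}
r2_change = {}
previous_r2 = 0.0

for source, r in multiple_r.items():
    r2 = round(r ** 2, 2)                              # variance explained so far
    cumulative_r2[source] = r2
    r2_change[source] = round(r2 - previous_r2, 2)     # increment added by this step
    previous_r2 = r2

for source in multiple_r:
    print(f"{source}: cumulative R2 = {cumulative_r2[source]:.2f}, "
          f"change = {r2_change[source]:.2f}")
```

Under these assumptions the recomputed values agree with the legible Cumulative and Change entries for those three rows.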






